22 research outputs found

    An End-to-end Neural Natural Language Interface for Databases

    The ability to extract insights from new data sets is critical for decision making. Visual interactive tools play an important role in data exploration since they provide non-technical users with an effective way to visually compose queries and comprehend the results. Natural language has recently gained traction as an alternative query interface to databases, with the potential to enable non-expert users to formulate complex questions and information needs efficiently and effectively. However, understanding natural language questions and translating them accurately to SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. In this paper, we present DBPal, a novel data exploration tool with a natural language interface. DBPal leverages recent advances in deep models to make query understanding more robust in the following ways: First, DBPal uses a deep model to translate natural language statements to SQL, making the translation process more robust to paraphrasing and other linguistic variations. Second, to support users in phrasing questions without knowing the database schema or the query features, DBPal provides a learned auto-completion model that suggests partial query extensions during query formulation and thus helps users write complex queries.
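    The auto-completion idea can be illustrated with a toy sketch: given a partially typed question, suggest likely next words ranked by how often they follow the current word in a log of past questions. DBPal uses a learned model for this; the bigram-frequency ranking below, including the example query log, is only a hypothetical stand-in.

```python
from collections import Counter

def build_suggester(query_log):
    """Count which word follows each word in a log of past questions."""
    follows = Counter()
    for q in query_log:
        words = q.lower().split()
        for a, b in zip(words, words[1:]):
            follows[(a, b)] += 1

    def suggest(partial, k=3):
        """Rank continuations of the last typed word by frequency."""
        last = partial.lower().split()[-1]
        ranked = [(w, c) for (p, w), c in follows.items() if p == last]
        ranked.sort(key=lambda t: -t[1])
        return [w for w, _ in ranked[:k]]

    return suggest

# Hypothetical log of previously formulated questions.
log = ["show average salary by department",
       "show average age by city",
       "show total salary by department"]
suggest = build_suggester(log)
# suggest("show") ranks "average" first (seen twice after "show").
```

    A real system would condition on the database schema and the full question prefix rather than a single word, but the interaction pattern — propose partial extensions while the user types — is the same.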

    Towards Interactive Summarization of Large Document Collections

    No full text

    Automated Ontology Refinement Using Compression-Based Learning

    No full text
    In this thesis, we propose an approach to refine ontologies for a given domain based on training corpora. We use the Minimum Description Length principle to assess the fit between ontology and text and to identify suitable refinement operations. For that, we need to calculate a score based on finding a representation of the text using the ontology. We propose restrictions to the search space and introduce heuristic functions to find this representation in a reasonable amount of time. Further heuristics are suggested to find modifications that improve the fit without the need to try every possible operation. We implement a framework for the refinement process that contains several refinement operations and can easily be extended with others. The functionality of the approach as well as the correctness of the implementation are tested with an extensive series of experiments. Synthetic data is used to confirm our hypotheses; afterwards, the algorithms are applied to real data. We also show that our system copes with large corpora containing millions of words. The resulting ontologies are evaluated using well-known metrics from ontology engineering. They can then be used in all kinds of natural language processing approaches that depend on ontologies. Additionally, we show how parts of our system can be used to solve natural language processing tasks directly. We suggest how its theoretical foundation can be used in classification tasks and show a practical application for such a task, namely semantic topic detection.
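    The MDL fit between ontology and text can be illustrated as a toy two-part code length: the bits needed to state the ontology plus the bits needed to encode the corpus given the concepts it provides. The uniform coding scheme below is a deliberate simplification invented for illustration, not the thesis's actual scoring function.

```python
import math

def description_length(ontology_terms, corpus_words, vocab_size):
    """Toy two-part MDL score: L(ontology) + L(corpus | ontology).
    Words covered by the ontology get a short code over the small
    concept set; uncovered words fall back to a long uniform code
    over the whole vocabulary."""
    # Part 1: cost of stating the ontology itself.
    l_ontology = len(ontology_terms) * math.log2(vocab_size)
    # Part 2: cost of the corpus given the ontology.
    concepts = set(ontology_terms)
    l_corpus = 0.0
    for w in corpus_words:
        if w in concepts:
            l_corpus += math.log2(len(concepts))  # short code: concept index
        else:
            l_corpus += math.log2(vocab_size)     # long code: raw word
    return l_ontology + l_corpus

# A refinement operation is worthwhile only if it lowers the total
# code length on the training corpus.
corpus = ["cat"] * 10 + ["dog"] * 10 + ["the"] * 5
with_ontology = description_length(["cat", "dog"], corpus, vocab_size=1000)
without_ontology = description_length([], corpus, vocab_size=1000)
```

    The search the thesis describes then amounts to trying candidate refinement operations, guided by heuristics, and keeping those that reduce this score.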

    Netted?! How to Improve the Usefulness of Spider & Co.

    No full text
    Natural language interfaces for databases (NLIDBs) are an intuitive way to access and explore structured data. That makes challenges like Spider (Yale's semantic parsing and text-to-SQL challenge) valuable, as they produce a series of approaches for NL-to-SQL translation. However, the resulting contributions leave something to be desired. In this paper, we analyze the usefulness of the submissions to the leaderboard for future research. We also present a prototypical implementation called UniverSQL that makes these approaches easier to use in information access systems. We hope that this lowered barrier encourages (future) participants of these challenges to add support for actual usage of their submissions. Finally, we discuss what could be done to improve future benchmarks and shared tasks for (not only) NLIDBs.

    DBPal: A Novel Lightweight NL2SQL Training Pipeline

    No full text
    Natural language (NL) is a promising alternative interface to database management systems (DBMSs) because it enables non-technical users to formulate complex questions. Recently, deep learning has gained traction for translating natural language to SQL. However, the core problem with existing deep learning approaches is that they require an enormous amount of manually curated training data in order to provide accurate translations. We present DBPal, which uses a novel training pipeline that synthesizes its training data to learn NL2SQL interfaces and thus does not rely on manually curated training data.
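    The abstract's core idea — synthesizing NL/SQL training pairs instead of curating them by hand — can be sketched as template instantiation over a database schema. The templates, table, and columns below are invented for illustration; the actual pipeline additionally applies paraphrasing and augmentation steps to the generated questions.

```python
from itertools import product

# Hypothetical schema vocabulary; a real pipeline would read this
# from the database catalog.
tables = {"patients": ["age", "length_of_stay"]}

# Hypothetical NL/SQL template pairs with schema slots.
templates = [
    ("what is the average {col} of {table}",
     "SELECT AVG({col}) FROM {table}"),
    ("show the maximum {col} in {table}",
     "SELECT MAX({col}) FROM {table}"),
]

def synthesize(tables, templates):
    """Instantiate every (template, table, column) combination into an
    (NL question, SQL query) training pair."""
    pairs = []
    for (nl_t, sql_t), (table, cols) in product(templates, tables.items()):
        for col in cols:
            pairs.append((nl_t.format(col=col, table=table),
                          sql_t.format(col=col, table=table)))
    return pairs

pairs = synthesize(tables, templates)
# 2 templates x 2 columns over 1 table yield 4 synthetic training pairs.
```

    Because the pairs are generated rather than hand-labeled, the training set grows automatically with the schema and the template library.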

    Interactive Summarization of Large Document Collections

    No full text
    We present a new system for custom summarization of large text corpora at interactive speed. Producing textual summaries is an important step towards understanding large collections of topic-related documents and has many real-world applications in journalism, medicine, and other fields. Key to our system is that the summarization model is refined by user feedback and called multiple times to improve the quality of the summarization iteratively. To that end, the human is brought into the loop to gather feedback in every iteration about which aspects of the intermediate summaries satisfy their individual information needs. Our system consists of a sampling component and a learned model to produce a textual summary. As we show in our evaluation, our system can provide a similar quality level as existing summarization models that work on the full corpus and hence cannot provide interactive speed.
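    The human-in-the-loop refinement can be sketched in a few lines: score candidate sentences, show a summary, then boost or penalize the terms the user marks as relevant or irrelevant before the next round. The word-weight scoring and the update factors below are deliberately simple stand-ins for the learned model the paper describes.

```python
def summarize(sentences, weights, k=2):
    """Pick the k sentences whose words carry the highest total weight."""
    score = lambda s: sum(weights.get(w, 1.0) for w in s.lower().split())
    return sorted(sentences, key=score, reverse=True)[:k]

def apply_feedback(weights, liked=(), disliked=()):
    """Update word weights from user feedback on the current summary."""
    for w in liked:
        weights[w] = weights.get(w, 1.0) * 3.0   # emphasize relevant terms
    for w in disliked:
        weights[w] = weights.get(w, 1.0) * 0.5   # de-emphasize the rest
    return weights

docs = ["the trial tested a new drug",
        "the drug reduced symptoms significantly",
        "funding for the trial came from a grant"]

weights = {}
first = summarize(docs, weights, k=1)    # initial, feedback-free summary
weights = apply_feedback(weights, liked=["drug"], disliked=["funding"])
second = summarize(docs, weights, k=1)   # refined towards the user's interest
```

    Each iteration thus re-ranks the candidate material under the updated preferences; the paper's sampling component additionally keeps each round fast by avoiding a pass over the full corpus.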

    Towards Robust and Transparent Natural Language Interfaces for Databases

    No full text
    In recent years, the field of research on natural language interfaces for databases (NLIDBs) has progressed considerably, as can be seen from the results of challenges like Spider. However, most of these approaches concentrate on delivering best-guess answers and improving (computational) accuracy. Yet, there are still many open issues regarding robustness, confidence, and transparency. Therefore, this vision paper points to relevant milestones as well as corresponding opportunities for addressing them, opening up a potential path for the future development of NLIDBs.